{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 均方差损失函数：MSE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "均方误差（Mean Square Error），应该是最常用的误差计算方法了，数学公式为：\n",
    "$$loss = \\frac{1}{N}\\sum {{{(y - pred)}^2}} $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "其中，$y$是真实值，$pred$是预测值，$N$通常指的是batch_size，也有时候是指特征属性个数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor([2 4 4 0 2], shape=(5,), dtype=int32)\n",
      "tf.Tensor(\n",
      "[[0. 0. 1. 0. 0.]\n",
      " [0. 0. 0. 0. 1.]\n",
      " [0. 0. 0. 0. 1.]\n",
      " [1. 0. 0. 0. 0.]\n",
      " [0. 0. 1. 0. 0.]], shape=(5, 5), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "y = tf.random.uniform((5,),maxval=5,dtype=tf.int32)  # 假设这是真实值\n",
    "print(y)\n",
    "\n",
    "y = tf.one_hot(y,depth=5)  # 转为热独编码\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=7, shape=(5, 5), dtype=float32, numpy=\n",
       "array([[0., 0., 1., 0., 0.],\n",
       "       [0., 0., 0., 0., 1.],\n",
       "       [0., 0., 0., 0., 1.],\n",
       "       [1., 0., 0., 0., 0.],\n",
       "       [0., 0., 1., 0., 0.]], dtype=float32)>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(\n",
      "[[0. 1. 0. 0. 0.]\n",
      " [0. 0. 0. 1. 0.]\n",
      " [1. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 1. 0.]\n",
      " [0. 0. 0. 0. 1.]], shape=(5, 5), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "pred = tf.random.uniform((5,),maxval=5,dtype=tf.int32)  # 假设这是预测值\n",
    "pred = tf.one_hot(pred,depth=5)  # 转为热独编码\n",
    "print(pred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=19, shape=(), dtype=float32, numpy=0.4>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loss1 = tf.reduce_mean(tf.square(y-pred))\n",
    "loss1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在tensorflow的losses模块中，提供能MSE方法用于求均方误差，注意简写MSE指的是一个方法，全写MeanSquaredError指的是一个类，通常通过方法的形式调用MSE使用这一功能。\n",
    "MSE方法返回的是每一对真实值和预测值之间的误差，若要求所有样本的误差需要进一步求平均值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=22, shape=(5,), dtype=float32, numpy=array([0.4, 0.4, 0.4, 0.4, 0.4], dtype=float32)>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loss_mse_1 = tf.losses.MSE(y,pred)\n",
    "loss_mse_1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=24, shape=(), dtype=float32, numpy=0.4>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loss_mse_2 = tf.reduce_mean(loss_mse_1)\n",
    "loss_mse_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一般而言，均方误差损失函数比较适用于回归问题中，对于分类问题，特别是目标输出为One-hot向量的分类任务中，下面要说的交叉熵损失函数就要合适的多。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2 交叉熵损失函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "交叉熵（Cross Entropy）是信息论中一个重要概念，主要用于度量两个概率分布间的差异性信息，交叉熵越小，两者之间差异越小,当交叉熵等于0时达到最佳状态，也即是预测值与真实值完全吻合。先给出交叉熵计算公式："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$H(p,q) =  - \\sum\\limits_i {p(x)\\log q(x)} $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "其中，$p(x)$是真实分布的概率，$q(x)$是模型通过数据计算出来的概率估计。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不理解？没关系，我们通过一个例子来说明。假设对于一个分类问题，其可能结果有5类，由$[1,2,3,4,5]$表示，有一个样本$x$，其真实结果是属于第2类，用One-hot编码表示就是$[0,1,0,0,0]$，也就是上面公司中的$p(x)$。现在有两个模型，对样本$x$的预测结果分别是$[0.1, 0.7, 0.05, 0.05, 0.1]$ 和 $[0, 0.6, 0.2, 0.1, 0.1]$，也就是上面公式中的$q(x)$。从直觉上判断，我们会认为第一个模型预测要准确一些，因为它更加肯定$x$属于第二类，不过，我们需要通过科学的量化分析对比来证明这一点："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一个模型交叉熵：${H_1} =  - (0 \\times \\log 0.1 + 1 \\times \\log 0.7 + 0 \\times \\log 0.05 + 0 \\times \\log 0.05 + 0 \\times \\log 0.01) =  - \\log 0.7 = 0.36$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第二个模型交叉熵：${H_2} =  - (0 \\times \\log 0 + 1 \\times \\log 0.6 + 0 \\times \\log 0.2 + 0 \\times \\log 0.1 + 0 \\times \\log 0.1) =  - \\log 0.6 = 0.51$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可见，${H_1} < {H_2}$，所以第一个模型的结果更加可靠。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在TensorFlow中，计算交叉熵通过tf.losses模块中的categorical_crossentropy()方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=41, shape=(), dtype=float32, numpy=0.35667497>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.losses.categorical_crossentropy([0,1,0,0,0],[0.1, 0.7, 0.05, 0.05, 0.1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=58, shape=(), dtype=float32, numpy=0.5108256>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.losses.categorical_crossentropy([0,1,0,0,0],[0, 0.6, 0.2, 0.1, 0.1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型在最后一层隐含层的输出可能并不是概率的形式，不过可以通过softmax函数转换为概率形式输出，然后计算交叉熵，但有时候可能会出现不稳定的情况，即输出结果是NAN或者inf，这种情况下可以通过直接计算隐藏层输出结果的交叉熵，不过要给categorical_crossentropy()方法传递一个from_logits=True参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = tf.random.normal([1,784])\n",
    "w = tf.random.normal([784,2])\n",
    "b = tf.zeros([2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=75, shape=(1, 2), dtype=float32, numpy=array([[ 5.236802, 18.843138]], dtype=float32)>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logits = x@w + b  # 最后一层没有激活函数的层称为logits层\n",
    "logits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=77, shape=(1, 2), dtype=float32, numpy=array([[1.2326591e-06, 9.9999881e-01]], dtype=float32)>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prob = tf.math.softmax(logits, axis=1)  # 转换为概率的形式\n",
    "prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=112, shape=(1,), dtype=float32, numpy=array([1.1920922e-06], dtype=float32)>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.losses.categorical_crossentropy([0,1],logits,from_logits=True)  # 通过logits层直接计算交叉熵"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=128, shape=(1,), dtype=float32, numpy=array([1.1920936e-06], dtype=float32)>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.losses.categorical_crossentropy([0,1],prob)  # 通过转换后的概率计算交叉熵"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tensorflow_gpu",
   "language": "python",
   "name": "tensorflow_gpu"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
